# Multimodal Inference
## Qwen2.5 VL 32B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: BCCard · Downloads: 140 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-32B-Instruct. It supports vision-text input with text output and is suited to efficient inference scenarios.
## Gemma 3 27b It FP8 Dynamic

- License: Gemma · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with weights quantized to the FP8 data type. It takes vision-text input, produces text output, and can be deployed efficiently with vLLM.
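Several of the entries above note that the FP8 checkpoints can be deployed with vLLM. A minimal sketch of what that looks like, assuming a GPU host with vLLM installed; the model id is taken from the listing, but the flags, port, and image URL are illustrative assumptions:

```shell
# Start vLLM's OpenAI-compatible server for one of the FP8 checkpoints
# (context length here is an illustrative choice, not a recommendation).
vllm serve RedHatAI/gemma-3-27b-it-FP8-dynamic --max-model-len 8192

# Once the server is up, send a vision-text chat request
# (the image URL is a placeholder for your own input):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "RedHatAI/gemma-3-27b-it-FP8-dynamic",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            {"type": "text", "text": "Describe this image."}
          ]
        }]
      }'
```

The same pattern applies to the Qwen2.5-VL and Pixtral checkpoints in this list; only the model id changes.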
## Mistral Small 3.1 24B Instruct 2503 Quantized.w4a16

- License: Apache-2.0 · Task: Image-to-Text · Library: Safetensors · Languages: Multilingual
- Publisher: RedHatAI · Downloads: 219 · Likes: 1

An INT4-quantized Mistral-Small-3.1-24B-Instruct-2503, optimized and released by Red Hat (Neural Magic); suited to fast-response conversational agents and low-latency inference scenarios.
## Qwen2.5 VL 7B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 25.18k · Likes: 1

An FP8-quantized version of Qwen2.5-VL-7B-Instruct that supports efficient vision-text inference through vLLM.
## Qwen2.5 VL 3B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 112 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-3B-Instruct that supports vision-text input with text output and improves inference efficiency.
## Pixtral 12b FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Safetensors · Languages: Multilingual
- Publisher: RedHatAI · Downloads: 87.31k · Likes: 9

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type cuts disk size and GPU memory requirements by roughly 50%. It is suitable for commercial and research use in multiple languages.
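The "approximately 50%" figure for FP8 follows directly from storage width: FP8 uses one byte per weight versus two for BF16/FP16. A back-of-envelope sketch, assuming a rounded 12B parameter count and ignoring overheads such as activations and the KV cache:

```python
# Back-of-envelope weight-memory estimate (illustrative figures only:
# parameter count is rounded, and runtime overheads are ignored).
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB."""
    return n_params * bytes_per_param / 2**30

n = 12e9  # ~12B parameters, as in pixtral-12b
bf16 = weight_memory_gb(n, 2)  # BF16/FP16: 2 bytes per weight
fp8 = weight_memory_gb(n, 1)   # FP8: 1 byte per weight

print(f"BF16: {bf16:.1f} GiB, FP8: {fp8:.1f} GiB, saving {1 - fp8 / bf16:.0%}")
# → BF16: 22.4 GiB, FP8: 11.2 GiB, saving 50%
```

The same halving applies to the other FP8 checkpoints in this list; the INT4 (w4a16) Mistral entry goes further, at roughly a quarter of the 16-bit weight footprint.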